Methods and apparatus for suppressing ambient noise using multiple audio signals

ABSTRACT

A method for suppressing ambient noise using multiple audio signals may include providing at least two audio signals captured by at least two electro-acoustic transducers. The at least two audio signals may include desired audio and ambient noise. The method may also include performing beamforming on the at least two audio signals in order to obtain a desired audio reference signal that is separate from a noise reference signal.

RELATED APPLICATIONS

This application is related to and claims priority from U.S. Provisional Patent Application Ser. No. 61/037,453, filed Mar. 18, 2008, for “Wind Gush Detection Using Multiple Microphones,” with inventors Dinesh Ramakrishnan and Song Wang, which is incorporated herein by reference.

TECHNICAL FIELD

The present disclosure relates generally to signal processing. More specifically, the present disclosure relates to suppressing ambient noise using multiple audio signals recorded using electro-acoustic transducers such as microphones.

BACKGROUND

Communication technologies continue to advance in many areas. As these technologies advance, users have more flexibility in the ways they may communicate with one another. For telephone calls, users may engage in direct two-way calls or conference calls. In addition, headsets or speakerphones may be used to enable hands-free operation. Calls may take place using standard telephones, cellular telephones, computing devices, etc.

This increased flexibility enabled by advancing communication technologies also makes it possible for users to make calls from many different kinds of environments. In some environments, various conditions may arise that can affect the call. One such condition is ambient noise.

Ambient noise may degrade transmitted audio quality. In particular, it may degrade transmitted speech quality. Hence, benefits may be realized by providing improved methods and apparatus for suppressing ambient noise.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an illustration of a wireless communications device and an example showing how voice audio and ambient noise may be received by the wireless communication device;

FIG. 2 a is a block diagram illustrating some aspects of one possible configuration of a system including ambient noise suppression;

FIG. 2 b is a block diagram illustrating some aspects of another possible configuration of a system including ambient noise suppression;

FIG. 3 a is a block diagram illustrating some aspects of one possible configuration of a beamformer;

FIG. 3 b is a block diagram illustrating some aspects of another possible configuration of a beamformer;

FIG. 3 c is a block diagram illustrating some aspects of another possible configuration of a beamformer;

FIG. 4 a is a block diagram illustrating some aspects of one possible configuration of a noise reference refiner;

FIG. 4 b is a block diagram illustrating some aspects of another possible configuration of a noise reference refiner;

FIG. 5 a is a more detailed block diagram illustrating some aspects of one possible configuration of a system including ambient noise suppression;

FIG. 5 b is a more detailed block diagram illustrating some aspects of another possible configuration of a system including ambient noise suppression;

FIG. 5 c illustrates an alternative configuration of a system including ambient noise suppression;

FIG. 5 d illustrates another alternative configuration of a system including ambient noise suppression;

FIG. 6 a is a flow diagram illustrating one example of a method for suppressing ambient noise;

FIG. 6 b is a flow diagram illustrating means-plus-function blocks corresponding to the method shown in FIG. 6 a;

FIG. 7 a is a block diagram illustrating some aspects of one possible configuration of a system including ambient noise suppression;

FIG. 7 b is a block diagram illustrating some aspects of another possible configuration of a system including ambient noise suppression;

FIG. 7 c is a block diagram illustrating some aspects of another possible configuration of a system including ambient noise suppression;

FIG. 8 a is a block diagram illustrating some aspects of one possible configuration of a calibrator;

FIG. 8 b is a block diagram illustrating some aspects of another possible configuration of a calibrator;

FIG. 8 c is a block diagram illustrating some aspects of another possible configuration of a calibrator;

FIG. 9 a is a block diagram illustrating some aspects of one possible configuration of a noise reference calibrator;

FIG. 9 b is a block diagram illustrating some aspects of another possible configuration of a noise reference calibrator;

FIG. 9 c is a block diagram illustrating some aspects of another possible configuration of a noise reference calibrator;

FIG. 10 is a block diagram illustrating some aspects of one possible configuration of a beamformer;

FIG. 11 is a block diagram illustrating some aspects of one possible configuration of a post-processing block;

FIG. 12 is a flow diagram illustrating a method for suppressing ambient noise;

FIG. 12 a illustrates means-plus-function blocks corresponding to the method of FIG. 12; and

FIG. 13 is a block diagram illustrating various components that may be utilized in a communication device that may be used to implement the methods described herein.

DETAILED DESCRIPTION

A method for suppressing ambient noise using multiple audio signals is disclosed. The method may include providing at least two audio signals by at least two electro-acoustic transducers. The at least two audio signals may include desired audio and ambient noise. The method may also include performing beamforming on the at least two audio signals in order to obtain a desired audio reference signal that is separate from a noise reference signal. The method may also include refining the noise reference signal by removing residual desired audio from the noise reference signal, thereby obtaining a refined noise reference signal.

An apparatus for suppressing ambient noise using multiple audio signals is disclosed. The apparatus may include at least two electro-acoustic transducers that provide at least two audio signals comprising desired audio and ambient noise. The apparatus may also include a beamformer that performs beamforming on the at least two audio signals in order to obtain a desired audio reference signal that is separate from a noise reference signal. The apparatus may also include a noise reference refiner that refines the noise reference signal by removing residual desired audio from the noise reference signal, thereby obtaining a refined noise reference signal.

An apparatus for suppressing ambient noise using multiple audio signals is disclosed. The apparatus may include means for providing at least two audio signals by at least two electro-acoustic transducers. The at least two audio signals comprise desired audio and ambient noise. The apparatus may also include means for performing beamforming on the at least two audio signals in order to obtain a desired audio reference signal that is separate from a noise reference signal. The apparatus may further include means for refining the noise reference signal by removing residual desired audio from the noise reference signal, thereby obtaining a refined noise reference signal.

A computer-program product for suppressing ambient noise using multiple audio signals is disclosed. The computer-program product may include a computer-readable medium having instructions thereon. The instructions may include code for providing at least two audio signals by at least two electro-acoustic transducers. The at least two audio signals may include desired audio and ambient noise. The instructions may also include code for performing beamforming on the at least two audio signals in order to obtain a desired audio reference signal that is separate from a noise reference signal. The instructions may also include code for refining the noise reference signal by removing residual desired audio from the noise reference signal, thereby obtaining a refined noise reference signal.

Mobile communication devices increasingly employ multiple microphones to improve transmitted voice quality in noisy scenarios. Multiple microphones may provide the capability to discriminate between desired voice and background noise and thus help improve the voice quality by suppressing background noise in the audio signal. Discrimination of voice from noise may be particularly difficult if the microphones are placed close to each other on the same side of the device. Methods and apparatus are presented for separating desired voice from noise in these scenarios.

Voice quality is a major concern in mobile communication systems. Voice quality is highly affected by the presence of ambient noise during the usage of a mobile communication device. One solution for improving voice quality during noisy scenarios may be to equip the mobile device with multiple microphones and use sophisticated signal processing techniques to separate the desired voice from ambient noise. In particular, mobile devices may employ two microphones for suppressing the background noise and improving voice quality. The two microphones may often be placed relatively far apart. For example, one microphone may be placed on the front side of the device and another microphone may be placed on the back side of the device, in order to exploit the diversity of acoustic reception and provide for better discrimination of desired voice and background noise. However, for ease of manufacturability and consumer usage, it may be beneficial to place the two microphones close to each other on the same side of the device. Many of the commonly available signal processing solutions are incapable of handling this closely spaced microphone configuration and do not provide good discrimination of desired voice and ambient noise. Hence, new methods and apparatus for improving the voice quality of a mobile communication device employing multiple microphones are disclosed. The proposed approach may be applicable to a wide variety of closely spaced microphone configurations (typically with a spacing of less than 5 cm). However, it is not limited to any particular value of microphone spacing.

Two closely spaced microphones on a mobile device may be exploited to improve the quality of transmitted voice. In particular, beamforming techniques may be used to discriminate desired audio (e.g., speech) from ambient noise and improve the audio quality by suppressing ambient noise. Beamforming may separate the desired audio from ambient noise by forming a beam towards the desired speaker. It may also separate ambient noise from the desired audio by forming a null beam in the direction of the desired audio. The beamformer output may or may not be post-processed in order to further improve the quality of the audio output.

FIG. 1 is an illustration of a wireless communications device 102 and an example showing how desired audio (e.g., speech 106) and ambient noise 108 may be received by the wireless communications device 102. A wireless communications device 102 may be used in an environment that may include ambient noise 108. Hence, the ambient noise 108, in addition to speech 106, may be received by microphones 110 a, 110 b, which may be housed in a wireless communications device 102. The ambient noise 108 may degrade the quality of the speech 106 as transmitted by the wireless communications device 102. Hence, benefits can be realized via methods and apparatus capable of separating and suppressing the ambient noise 108 from the speech 106. Although this example is given, the methods and apparatus disclosed herein can be utilized in any number of configurations. For example, the methods and apparatus disclosed herein may be configured for use in a mobile phone, “land line” phone, wired headset, wireless headset (e.g., Bluetooth®), hearing aid, audio/video recording device, and virtually any other device that utilizes transducers/microphones for receiving audio.

FIG. 2 a is a block diagram illustrating some aspects of one possible configuration of a system 200 a including ambient noise suppression. The system 200 a may include a beamformer 214 and/or a noise reference refiner 220 a. The system 200 a may be configured to receive digital audio signals 212 a, 212 b. The digital audio signals 212 a, 212 b may or may not have matching or similar energy levels. The digital audio signals 212 a, 212 b may be signals from two audio sources (e.g., the microphones 110 a, 110 b in the device 102 shown in FIG. 1).

The digital audio signals 212 a, 212 b may have matching or similar signal characteristics. For example, both signals 212 a, 212 b may include a desired audio signal (e.g., speech 106). The digital audio signals 212 a, 212 b may also include ambient noise 108.

The digital audio signals 212 a, 212 b may be received by a beamformer 214. One of the digital audio signals, 212 a, may also be routed to a noise reference refiner 220 a. The beamformer 214 may generate a desired audio reference signal 216 (e.g., a voice/speech reference signal). The beamformer 214 may also generate a noise reference signal 218. The noise reference signal 218 may contain residual desired audio. The noise reference refiner 220 a may reduce or effectively eliminate the residual desired audio from the noise reference signal 218 in order to generate a refined noise reference signal 222 a. The noise reference refiner 220 a may utilize one of the digital audio signals, 212 a, to generate the refined noise reference signal 222 a. The desired audio reference signal 216 and the refined noise reference signal 222 a may be utilized to improve desired audio output. For example, the refined noise reference signal 222 a may be filtered and subtracted from the desired audio reference signal 216 in order to reduce noise in the desired audio. The refined noise reference signal 222 a and the desired audio reference signal 216 may also be further processed to reduce noise in the desired audio.

FIG. 2 b is a block diagram illustrating some aspects of another possible configuration of a system 200 b including ambient noise suppression. The system 200 b may include digital audio signals 212 a, 212 b, a beamformer 214, a desired audio reference signal 216, a noise reference signal 218, a noise reference refiner 220 b, and a refined noise reference signal 222 b. Because the noise reference signal 218 may include residual desired audio, the noise reference refiner 220 b may reduce or effectively eliminate residual desired audio from the noise reference signal 218. The noise reference refiner 220 b may utilize both digital audio signals 212 a, 212 b in addition to the noise reference signal 218 in order to generate a refined noise reference signal 222 b. The refined noise reference signal 222 b and the desired audio reference signal 216 may be utilized in order to improve the desired audio.

FIG. 3 a is a block diagram illustrating some aspects of one possible configuration of a beamformer 314 a. The primary purpose of the beamformer 314 a may be to process digital audio signals 312 a, 312 b and generate a desired audio reference signal 316 a and a noise reference signal 318 a. The noise reference signal 318 a may be generated by forming a null beam towards the desired audio source (e.g., the user) and suppressing the desired audio (e.g., the speech 106) from the digital audio signals 312 a, 312 b. The desired audio reference signal 316 a may be generated by forming a beam towards the desired audio source and suppressing ambient noise 108 coming from other directions. The beamforming process may be performed through fixed beamforming and/or adaptive beamforming. FIG. 3 a illustrates a configuration 300 a utilizing a fixed beamforming approach.

The beamformer 314 a may be configured to receive the digital audio signals 312 a, 312 b. The digital audio signals 312 a, 312 b may or may not be calibrated such that their energy levels are matched or similar. The digital audio signals 312 a, 312 b may be designated $z_{c1}(n)$ and $z_{c2}(n)$, respectively, where n is the digital audio sample number. A simple form of fixed beamforming may be referred to as “broadside” beamforming. The desired audio reference signal 316 a may be designated $z_{b1}(n)$. For fixed “broadside” beamforming, the desired audio reference signal 316 a may be given by equation (1):

$$z_{b1}(n) = z_{c1}(n) + z_{c2}(n) \tag{1}$$

The noise reference signal 318 a may be designated $z_{b2}(n)$ and may be given by equation (2):

$$z_{b2}(n) = z_{c1}(n) - z_{c2}(n) \tag{2}$$

Broadside beamforming assumes that the desired audio source is equidistant from the two microphones (e.g., microphones 110 a, 110 b). If the desired audio source is closer to one microphone than the other, the desired audio signal captured by the farther microphone will suffer a time delay compared to the desired audio signal captured by the nearer microphone. In this case, the performance of the fixed beamformer can be improved by compensating for the time delay difference between the two microphone signals. Hence, the beamformer 314 a may include a delay compensation filter 324. The desired audio reference signal 316 a and the noise reference signal 318 a may then be expressed as in equations (3) and (4), respectively.

$$z_{b1}(n) = z_{c1}(n) + z_{c2}(n-\tau) \tag{3}$$

$$z_{b2}(n) = z_{c1}(n) - z_{c2}(n-\tau) \tag{4}$$

Here, τ may denote the time delay between the digital audio signals 312 a, 312 b captured by the two microphones and may take either positive or negative values. The time delay difference between the two microphone signals may be calculated using any of the methods of time delay computation known in the art. The accuracy of time delay estimation methods may be improved by computing the time delay estimates only during desired audio activity periods.
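The fixed beamformer of equations (1) through (4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the disclosed implementation: the delay τ is restricted to an integer number of samples, samples outside the signal are treated as zero, and the function names `delayed` and `fixed_beamformer` are hypothetical. Setting `tau=0` reduces the sketch to the plain broadside form of equations (1) and (2).

```python
from typing import List, Tuple


def delayed(x: List[float], tau: int) -> List[float]:
    """Return x(n - tau) for an integer tau (positive or negative); out-of-range samples are 0."""
    n_samples = len(x)
    return [x[n - tau] if 0 <= n - tau < n_samples else 0.0 for n in range(n_samples)]


def fixed_beamformer(z_c1: List[float], z_c2: List[float],
                     tau: int = 0) -> Tuple[List[float], List[float]]:
    """Sum/difference beamformer: eqs. (3)-(4), or eqs. (1)-(2) when tau == 0."""
    z_c2_delayed = delayed(z_c2, tau)                      # delay compensation (filter 324)
    z_b1 = [a + b for a, b in zip(z_c1, z_c2_delayed)]     # desired audio reference
    z_b2 = [a - b for a, b in zip(z_c1, z_c2_delayed)]     # noise reference
    return z_b1, z_b2


# Example: microphone 1 hears the talker 2 samples later than microphone 2.
source = [0.0, 1.0, 0.5, -0.3, 0.0, 0.0, 0.0, 0.0]
mic2 = source
mic1 = delayed(source, 2)
speech_ref, noise_ref = fixed_beamformer(mic1, mic2, tau=2)
print(noise_ref)  # all zeros: the talker is fully cancelled in the noise reference
```

With the correct τ, the talker cancels exactly in the difference channel and doubles in the sum channel; with a wrong τ, part of the talker leaks into the noise reference, which is the residual desired audio that the later refining stage targets.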

The time delay τ may also take fractional values if the microphones are very closely spaced (e.g., less than 4 cm). In this case, fractional time delay estimation techniques may be used to calculate τ. Fractional time delay compensation may be performed using a sinc filtering method. In this method, the calibrated microphone signal is convolved with a delayed sinc signal to perform fractional time delay compensation, as shown in equation (5), where $*$ denotes convolution:

$$z_{c2}(n-\tau) = z_{c2}(n) * \operatorname{sinc}(n-\tau) \tag{5}$$
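Equation (5) can be illustrated with a direct convolution against a shifted sinc. This is a sketch under assumptions: the ideal sinc kernel is infinitely long, so the hypothetical `fractional_delay` function truncates it to `2 * half_len + 1` taps around the delayed position (a windowed kernel would reduce ripple further), and samples outside the signal are treated as zero.

```python
import math
from typing import List


def sinc(x: float) -> float:
    """Normalized sinc, sin(pi*x)/(pi*x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def fractional_delay(x: List[float], tau: float, half_len: int = 32) -> List[float]:
    """Approximate x(n - tau) via eq. (5): y(n) = sum_m x(m) * sinc(n - m - tau).

    Only the 2*half_len + 1 kernel taps nearest the delayed position are kept.
    """
    n_samples = len(x)
    shift = math.floor(tau)                     # integer part of the delay
    y = []
    for n in range(n_samples):
        acc = 0.0
        for m in range(n - shift - half_len, n - shift + half_len + 1):
            if 0 <= m < n_samples:              # samples outside the signal count as zero
                acc += x[m] * sinc(n - m - tau)
        y.append(acc)
    return y


# Example: delay a slowly varying sinusoid by half a sample.
x = [math.sin(2 * math.pi * 0.02 * n) for n in range(200)]
y = fractional_delay(x, 0.5)
```

For an integer τ the kernel collapses to a single unit tap (up to rounding), so the same routine also covers the integer-delay case of equations (3) and (4).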

A simple procedure for computing the fractional time delay may involve searching for the value of τ that maximizes the cross-correlation between the first digital audio signal 312 a (e.g., $z_{c1}(n)$) and the time-delay-compensated second digital audio signal 312 b (e.g., $z_{c2}(n-\tau)$), as shown in equation (6):

$\begin{matrix}{{\tau (k)} = {\underset{\underset{\tau}{}}{\arg \; \max}{{\sum\limits_{n = {{({k - 1})}N}}^{kN}{{z_{c\; 1}(n)}{z_{c\; 2}\left( {n - \tau} \right)}}}}}} & (6)\end{matrix}$

Here, the digital audio signals 312 a, 312 b may be segmented into frames, where N is the number of samples per frame and k is the frame number. The cross-correlation between the digital audio signals 312 a, 312 b (e.g., $z_{c1}(n)$ and $z_{c2}(n)$) may be computed for a variety of values of τ, and the time delay estimate is the value of τ that maximizes the cross-correlation. This procedure may provide good results when the Signal-to-Noise Ratio (SNR) of the digital audio signals 312 a, 312 b is high.
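The frame-wise search of equation (6) might look like the following sketch. It assumes integer candidate lags only (fractional candidates could be evaluated by first delaying $z_{c2}(n)$ with a sinc interpolator as in equation (5)), numbers frames from k = 1, and sums over exactly N samples per frame; the helper names `frame_xcorr` and `estimate_delay` are hypothetical. No voice-activity gating is included.

```python
from typing import Iterable, Sequence


def frame_xcorr(z_c1: Sequence[float], z_c2: Sequence[float],
                k: int, n_frame: int, tau: int) -> float:
    """Cross-correlation term of eq. (6) for frame k (k = 1, 2, ...) and integer lag tau."""
    start = (k - 1) * n_frame
    acc = 0.0
    for n in range(start, start + n_frame):        # N samples in frame k
        if 0 <= n < len(z_c1) and 0 <= n - tau < len(z_c2):
            acc += z_c1[n] * z_c2[n - tau]
    return acc


def estimate_delay(z_c1: Sequence[float], z_c2: Sequence[float], k: int,
                   n_frame: int, candidates: Iterable[int]) -> int:
    """Return the candidate lag that maximizes the frame cross-correlation (eq. (6))."""
    return max(candidates, key=lambda tau: frame_xcorr(z_c1, z_c2, k, n_frame, tau))
```

Because only the frame's own samples enter the sum, the estimate can be recomputed per frame and, as the text suggests, kept only for frames in which desired audio is active.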

FIG. 3 b is a block diagram illustrating some aspects of another possible configuration of a beamformer 314 b. The fixed beamforming procedure (as shown in FIG. 3 a) assumes that the frequency responses of the two microphones are well matched. There may be slight differences, however, between the frequency responses of the two microphones. The beamformer 314 b may therefore utilize adaptive beamforming techniques. In this procedure, an adaptive filter 326 may be used to match the second digital audio signal 312 b with the first digital audio signal 312 a. That is, the adaptive filter 326 may match the frequency responses of the two microphones, as well as compensate for any delay between the digital audio signals 312 a, 312 b. The second digital audio signal 312 b may be used as the input to the adaptive filter 326, while the first digital audio signal 312 a may be used as the reference to the adaptive filter 326. The filtered audio signal 328 may be designated $z_{w2}(n)$. The noise reference (or “beamformed”) signal 318 b may be designated $z_{b2}(n)$. The weights for the adaptive filter 326 may be designated $w_1(i)$, where i is an integer between zero and M−1, M being the length of the filter. The adaptive filtering process may be expressed as shown in equations (7) and (8):

$\begin{matrix}{{z_{w\; 2}(n)} = {\sum\limits_{i = 0}^{M - 1}{{w_{1}(i)}{z_{c\; 2}\left( {n - i} \right)}}}} & (7) \\{{z_{b\; 2}(n)} = {{z_{c\; 1}(n)} - {z_{w\; 2}(n)}}} & (8)\end{matrix}$

The adaptive filter weights $w_1(i)$ may be adapted using any standard adaptive filtering algorithm, such as Least Mean Squares (LMS) or Normalized LMS (NLMS). The desired audio reference signal 316 b (e.g., $z_{b1}(n)$) and the noise reference signal 318 b (e.g., $z_{b2}(n)$) may be expressed as shown in equations (9) and (10):

$$z_{b1}(n) = z_{c1}(n) + z_{w2}(n) \tag{9}$$

$$z_{b2}(n) = z_{c1}(n) - z_{w2}(n) \tag{10}$$

The adaptive beamforming procedure shown in FIG. 3 b may remove more desired audio from the second digital audio signal 312 b and may produce a better noise reference signal 318 b than the fixed beamforming technique shown in FIG. 3 a.
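Equations (7) through (10) can be sketched with NLMS, one of the update rules the text names. This is a hedged illustration, not the disclosed design: the filter length `m_taps`, step size `mu`, and regularization `eps` are illustrative assumptions, and `adaptive_beamformer` is a hypothetical name. The noise reference $z_{b2}(n)$ doubles as the adaptation error, so the filter converges toward whatever shaping of $z_{c2}(n)$ best cancels the component it shares with $z_{c1}(n)$.

```python
from typing import List, Tuple


def adaptive_beamformer(z_c1: List[float], z_c2: List[float], m_taps: int = 8,
                        mu: float = 0.5, eps: float = 1e-8
                        ) -> Tuple[List[float], List[float], List[float]]:
    """FIG. 3b style beamformer, eqs. (7)-(10), with NLMS weight updates.

    Returns (z_b1, z_b2, w1): desired audio reference, noise reference, final weights.
    """
    w1 = [0.0] * m_taps
    history = [0.0] * m_taps                               # history[i] = z_c2(n - i)
    z_b1: List[float] = []
    z_b2: List[float] = []
    for n in range(len(z_c1)):
        history = [z_c2[n]] + history[:-1]
        z_w2 = sum(w * x for w, x in zip(w1, history))     # eq. (7)
        error = z_c1[n] - z_w2                             # eqs. (8) and (10)
        z_b1.append(z_c1[n] + z_w2)                        # eq. (9)
        z_b2.append(error)
        power = eps + sum(x * x for x in history)          # NLMS normalization
        w1 = [w + mu * error * x / power for w, x in zip(w1, history)]
    return z_b1, z_b2, w1
```

In a real device the adaptation would typically be controlled (for example, updated mainly while the desired talker is active) so that the filter learns the talker path rather than the noise field; that control logic is outside this sketch.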

FIG. 3 c is a block diagram illustrating some aspects of another possible configuration of a beamformer 314 c. The beamformer 314 c may be applied only to generate a noise reference signal 318 c, and the first digital audio signal 312 a may simply be used as the desired audio reference signal 316 c (e.g., $z_{b1}(n) = z_{c1}(n)$). In certain scenarios, this method may prevent possible desired audio quality degradation, such as reverberation effects caused by the beamformer 314 c.

FIG. 4 a is a block diagram illustrating some aspects of one possible configuration of a noise reference refiner 420 a. The noise reference signal 418 generated by the beamformer (e.g., beamformers 214, 314 a-c) may still contain some residual desired audio, and this may cause quality degradation at the output of the overall system. The purpose of the noise reference refiner 420 a may be to remove further residual desired audio from the noise reference signal 418 (e.g., $z_{b2}(n)$).

Typically, if the microphones are not located very close to each other, the residual desired audio may have dominant high-frequency content. Thus, noise reference refining may be performed by removing high-frequency residual desired audio from the noise reference signal 418. An adaptive filter 434 may be used for removing residual desired audio from the noise reference signal 418. The first digital audio signal 412 a (e.g., $z_{c1}(n)$) may optionally be provided to a high-pass filter 430. An IIR or FIR filter (e.g., $h_{HPF}(n)$) with a 1500-2000 Hz cutoff frequency may be used for high-pass filtering the first digital audio signal 412 a. The high-pass filter 430 may help ensure that only the high-frequency residual desired audio is removed from the noise reference signal 418. The high-pass-filtered first digital audio signal 432 a may be designated $z_i(n)$. The adaptive filter output 436 a may be designated $z_{wr}(n)$. The adaptive filter weights (e.g., $w_r(i)$) may be updated using any method known in the art, such as LMS or NLMS. The refined noise reference signal 422 a may be designated $z_{br}(n)$. The noise reference refiner 420 a may be configured to implement a noise reference refining process as expressed in equations (11), (12), and (13):

$\begin{matrix}{{z_{i}(n)} = {{z_{c\; 1}(n)}*{h_{HPF}(n)}}} & (11) \\{{z_{wr}(n)} = {\sum\limits_{i = 0}^{M - 1}{{w_{r}(i)}{z_{i}\left( {n - i} \right)}}}} & (12) \\{{z_{br}(n)} = {{z_{b\; 2}(n)} - {z_{wr}(n)}}} & (13)\end{matrix}$

FIG. 4 b is a block diagram illustrating some aspects of another possible configuration of a noise reference refiner 420 b. In this configuration, the difference between the digital audio signals 412 a, 412 b (e.g., $z_{c1}(n)$, $z_{c2}(n)$) may be input into the optional high-pass filter 430. The output 432 b of the high-pass filter 430 may be designated $z_i(n)$. The output 436 b of the adaptive filter 434 may be designated $z_{wr}(n)$. The refined noise reference signal 422 b may be designated $z_{br}(n)$. The noise reference refiner 420 b may be configured to implement a noise reference refining process as expressed in equations (14), (15), and (16):

$$z_i(n) = \left(z_{c1}(n) - z_{c2}(n)\right) * h_{HPF}(n) \tag{14}$$

$$z_{wr}(n) = \sum_{i=0}^{M-1} w_r(i)\, z_i(n-i) \tag{15}$$

$$z_{br}(n) = z_{b2}(n) - z_{wr}(n) \tag{16}$$

FIG. 5 a is a more detailed block diagram illustrating some aspects of one possible configuration of a system 500 a including ambient noise suppression. A beamformer 514 (including an adaptive filter 526) and a noise reference refiner 520 a (including a high-pass filter 530 and an adaptive filter 534) may receive digital audio signals 512 a, 512 b and output a desired audio reference signal 516 and a refined noise reference signal 522 a. In some cases, the high-pass filter 530 may be optional.

FIG. 5 b is a more detailed block diagram illustrating some aspects of another possible configuration of a system 500 b including ambient noise suppression. A beamformer 514 (including an adaptive filter 526) and a noise reference refiner 520 b (including a high-pass filter 530 and an adaptive filter 534) may receive digital audio signals 512 a, 512 b and output a desired audio reference signal 516 and a refined noise reference signal 522 b. In this configuration, the noise reference refiner 520 b may input the difference between the first digital audio signal 512 a and the second digital audio signal 512 b into the optional high-pass filter 530.

FIG. 5 c illustrates an alternative configuration of a system 500 c including ambient noise suppression. The system 500 c of FIG. 5 c is similar to the system 500 b of FIG. 5 b, except that in the system 500 c of FIG. 5 c, the desired audio reference signal 516 is provided as input to the high-pass filter 530 (instead of the difference between the first digital audio signal 512 a and the second digital audio signal 512 b).

FIG. 5 d illustrates another alternative configuration of a system 500 d including ambient noise suppression. The system 500 d of FIG. 5 d is similar to the system 500 b of FIG. 5 b, except that in the system 500 d of FIG. 5 d, the desired audio reference signal 516 output by the beamformer 514 is simply the first digital audio signal 512 a.

FIG. 6 a is a flow diagram illustrating one example of a method 600 a for suppressing ambient noise. Digital audio from multiple sources is beamformed 638 a. The digital audio from multiple sources may or may not have matching or similar energy levels. The digital audio from multiple sources may have matching or similar signal characteristics. For example, the digital audio from each source may include a dominant speech 106 and ambient noise 108. A desired audio reference signal (e.g., desired audio reference signal 216) and a noise reference signal (e.g., noise reference signal 218) may be generated via beamforming 638 a. The noise reference signal may contain residual desired audio. The residual desired audio may be reduced or effectively eliminated from the noise reference signal by refining 640 a the noise reference signal. The method 600 a shown may be an ongoing process.

The method 600 a described in FIG. 6 a above may be performed by various hardware and/or software component(s) and/or module(s) corresponding to the means-plus-function blocks 600 b illustrated in FIG. 6 b. In other words, blocks 638 a through 640 a illustrated in FIG. 6 a correspond to means-plus-function blocks 638 b through 640 b illustrated in FIG. 6 b.

FIG. 7 a is a block diagram illustrating some aspects of one possible configuration of a system 700 a including ambient noise suppression. A system 700 a including ambient noise suppression may include transducers (e.g., microphones) 710 a, 710 b, Analog-to-Digital Converters (ADCs) 744 a, 744 b, a calibrator 748, a first beamformer 714, a noise reference refiner 720, a noise reference calibrator 750, a second beamformer 754, and post-processing components 760.

The transducers 710 a, 710 b may capture sound information and convert it to analog signals 742 a, 742 b. The transducers 710 a, 710 b may include any device or devices used for converting sound information into electrical (or other) signals. For example, they may be electro-acoustic transducers such as microphones. The ADCs 744 a, 744 b may convert the analog signals 742 a, 742 b captured by the transducers 710 a, 710 b into uncalibrated digital audio signals 746 a, 746 b. The ADCs 744 a, 744 b may sample the analog signals at a sampling frequency $f_s$.

The two uncalibrated digital audio signals 746 a, 746 b may be calibrated by the calibrator 748 in order to compensate for differences in microphone sensitivities and for differences in near-field speech levels. The calibrated digital audio signals 712 a, 712 b may be processed by the first beamformer 714 to provide a desired audio reference signal 716 and a noise reference signal 718. The first beamformer 714 may be a fixed beamformer or an adaptive beamformer. The noise reference refiner 720 may refine the noise reference signal 718 to further remove residual desired audio.

The refined noise reference signal 722 may also be calibrated by the noise reference calibrator 750 in order to compensate for attenuation effects caused by the first beamformer 714. The desired audio reference signal 716 and the calibrated noise reference signal 752 may be processed by the second beamformer 754 to produce the second desired audio reference signal 756 and the second noise reference signal 758. The second desired audio reference signal 756 and the second noise reference signal 758 may optionally undergo post-processing 760 to remove more residual noise from the second desired audio reference signal 756. The desired audio output signal 762 and the noise reference output signal 764 may be transmitted, output via a speaker, processed further, or otherwise utilized.

FIG. 7 b is a block diagram illustrating some aspects of another possible configuration of a system 700 b including ambient noise suppression. A processor 766 may execute instructions and/or perform operations in order to implement the calibrator 748, first beamformer 714, noise reference refiner 720, noise reference calibrator 750, second beamformer 754, and/or post-processing 760.

FIG. 7 c is a block diagram illustrating some aspects of another possible configuration of a system 700 c including ambient noise suppression. A processor 766 a may execute instructions and/or perform operations in order to implement the calibrator 748 and first beamformer 714. Another processor 766 b may execute instructions and/or perform operations in order to implement the noise reference refiner 720 and noise reference calibrator 750. Another processor 766 c may execute instructions and/or perform operations in order to implement the second beamformer 754 and post-processing 760. Individual processors may be arranged to handle each block individually or any combination of blocks.

FIG. 8 a is a block diagram illustrating some aspects of one possible configuration of a calibrator 848 a. The calibrator 848 a may serve two purposes: to compensate for any difference in microphone sensitivities, and to compensate for the near-field desired audio level difference in the uncalibrated digital audio signals 846 a, 846 b. Microphone sensitivity measures the voltage generated by a microphone for a given input pressure of the incident acoustic field. If two microphones have different sensitivities, they will produce different voltage levels for the same input pressure. This difference may be compensated before performing beamforming. A second factor that may be considered is the near-field effect. Since the user holding the mobile device may be in close proximity to the two microphones, any change in handset orientation may result in significant differences between the signal levels captured by the two microphones. Compensating for this signal level difference may help the first-stage beamformer generate a better noise reference signal.

The differences in microphone sensitivity and audio level (due to the near-field effect) may be compensated by computing a set of calibration factors (which may also be referred to as scaling factors) and applying them to one or more uncalibrated digital audio signals 846 a, 846 b.

The calibration block 868 a may compute a calibration factor and apply it to one of the uncalibrated digital audio signals 846 a, 846 b so that the signal level in the second digital audio signal 812 b is close to that of the first digital audio signal 812 a.

A variety of methods may be used for computing the appropriate calibration factor. One approach is to compute the single-tap Wiener filter coefficient and use it as the calibration factor for the second uncalibrated digital audio signal 846 b. The single-tap Wiener filter coefficient may be computed from the cross-correlation between the two uncalibrated digital audio signals 846 a, 846 b and the energy of the second uncalibrated digital audio signal 846 b. The two uncalibrated digital audio signals 846 a, 846 b may be designated z₁(n) and z₂(n), where n denotes the time instant or sample number. The uncalibrated digital audio signals 846 a, 846 b may be segmented into frames (or blocks) of length N. For each frame k, the block cross-correlation R̂₁₂(k) and block energy estimate P̂₂₂(k) may be calculated as shown in equations (17) and (18):

$\begin{matrix}{{{\hat{R}}_{12}(k)} = {\sum\limits_{n = {{({k - 1})}N}}^{kN}{{z_{1}(n)}{z_{2}(n)}}}} & (17) \\{{{\hat{P}}_{22}(k)} = {\sum\limits_{n = {{({k - 1})}N}}^{kN}{{z_{2}(n)}{z_{2}(n)}}}} & (18)\end{matrix}$

The block cross-correlation R̂₁₂(k) and block energy estimate P̂₂₂(k) may optionally be smoothed using an exponential averaging method to minimize the variance of the estimates, as shown in equations (19) and (20):

$\bar{R}_{12}(k) = \lambda_1 \bar{R}_{12}(k-1) + (1-\lambda_1)\hat{R}_{12}(k)$   (19)

$\bar{P}_{22}(k) = \lambda_2 \bar{P}_{22}(k-1) + (1-\lambda_2)\hat{P}_{22}(k)$   (20)

λ₁ and λ₂ are averaging constants that may take values between 0 and 1. The higher the values of λ₁ and λ₂, the smoother the averaging process(es) and the lower the variance of the estimates. Values in the range 0.9 to 0.99 have typically been found to give good results.
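As a rough worked example of this trade-off, assume (an idealization not stated above) that the raw per-frame estimates R̂₁₂(k) are uncorrelated from frame to frame with a common variance σ². Unrolling recursion (19) then gives the steady-state variance of the smoothed estimate in closed form:

```latex
\bar{R}_{12}(k) = (1-\lambda_1)\sum_{j=0}^{\infty}\lambda_1^{\,j}\,\hat{R}_{12}(k-j),
\qquad
\operatorname{Var}\!\left[\bar{R}_{12}(k)\right]
  = (1-\lambda_1)^2\sum_{j=0}^{\infty}\lambda_1^{\,2j}\,\sigma^2
  = \frac{(1-\lambda_1)^2}{1-\lambda_1^2}\,\sigma^2
  = \frac{1-\lambda_1}{1+\lambda_1}\,\sigma^2 .
```

Under this assumption, λ₁ = 0.9 reduces the variance to about 0.053σ² and λ₁ = 0.99 to about 0.005σ², while the effective memory of the average grows to roughly 1/(1 − λ₁) frames (about 10 and 100 frames, respectively), so the estimate responds more slowly to changes in the signals.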

The calibration factor ĉ₂(k) for the second uncalibrated digital audio signal 846 b may be found by computing the ratio of the smoothed block cross-correlation estimate to the smoothed block energy estimate, as shown in equation (21):

$\begin{matrix}{{{\hat{c}}_{2}(k)} = \frac{{\overset{\_}{R}}_{12}(k)}{{\overset{\_}{P}}_{22}(k)}} & (21)\end{matrix}$

The calibration factor ĉ₂(k) may optionally be smoothed to minimize abrupt variations, as shown in equation (22). The smoothing constant β₂ may be chosen in the range 0.7 to 0.9.

$\bar{c}_2(k) = \beta_2 \bar{c}_2(k-1) + (1-\beta_2)\hat{c}_2(k)$   (22)

The estimate of the calibration factor may be improved by computing and updating the calibration factor only during periods of desired audio activity. Any method of Voice Activity Detection (VAD) known in the art may be used for this purpose.
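A minimal pure-Python sketch of the single-tap Wiener calibration in equations (17) through (22), assuming non-overlapping frames of N samples. The default averaging constants, the initial factor of 1.0, and the `is_speech` callback standing in for a voice activity detector are illustrative assumptions, not values or interfaces specified above. When the callback rejects a frame, the statistics and the factor are simply held.

```python
from typing import Callable, List, Optional, Sequence


def wiener_calibration(
    z1: Sequence[float],
    z2: Sequence[float],
    frame_len: int,                      # N
    lam1: float = 0.95,                  # lambda_1, within 0.9 to 0.99
    lam2: float = 0.95,                  # lambda_2, within 0.9 to 0.99
    beta2: float = 0.8,                  # beta_2, within 0.7 to 0.9
    is_speech: Optional[Callable[[Sequence[float]], bool]] = None,  # hypothetical VAD hook
) -> List[float]:
    """Return the smoothed calibration factor c2(k) for z2, one value per frame."""
    r12_bar = 0.0
    p22_bar = 0.0
    c2_bar = 1.0                         # assumed initial factor: no scaling
    factors: List[float] = []
    for k in range(min(len(z1), len(z2)) // frame_len):
        f1 = z1[k * frame_len:(k + 1) * frame_len]
        f2 = z2[k * frame_len:(k + 1) * frame_len]
        # Update only during desired audio activity (or always, if no VAD is given).
        if is_speech is None or is_speech(f1):
            r12_hat = sum(a * b for a, b in zip(f1, f2))       # (17)
            p22_hat = sum(b * b for b in f2)                   # (18)
            r12_bar = lam1 * r12_bar + (1 - lam1) * r12_hat    # (19)
            p22_bar = lam2 * p22_bar + (1 - lam2) * p22_hat    # (20)
            if p22_bar > 0.0:
                c2_hat = r12_bar / p22_bar                     # (21)
                c2_bar = beta2 * c2_bar + (1 - beta2) * c2_hat  # (22)
        factors.append(c2_bar)
    return factors
```

In this sketch, multiplying each frame of z₂(n) by the corresponding factor would yield the second digital audio signal 812 b; for example, if z₂(n) is exactly half of z₁(n), the factor converges toward 2.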

The calibration factor may alternatively be estimated using a maximum searching method. In this method, the block energy estimates P̂₁₁(k) and P̂₂₂(k) of the two uncalibrated digital audio signals 846 a, 846 b may be searched for desired audio energy maxima, and the ratio of the two maxima may be used to compute the calibration factor. The block energy estimates P̂₁₁(k) and P̂₂₂(k) may be computed as shown in equations (23) and (24):

$\begin{matrix}{{{\hat{P}}_{11}(k)} = {\sum\limits_{n = {{({k - 1})}N}}^{kN}{{z_{1}(n)}{z_{1}(n)}}}} & (23) \\{{{\hat{P}}_{22}(k)} = {\sum\limits_{n = {{({k - 1})}N}}^{kN}{{z_{2}(n)}{z_{2}(n)}}}} & (24)\end{matrix}$

The block energy estimates P̂₁₁(k) and P̂₂₂(k) may optionally be smoothed as shown in equations (25) and (26):

$\bar{P}_{11}(k) = \lambda_3 \bar{P}_{11}(k-1) + (1-\lambda_3)\hat{P}_{11}(k)$   (25)

$\bar{P}_{22}(k) = \lambda_2 \bar{P}_{22}(k-1) + (1-\lambda_2)\hat{P}_{22}(k)$   (26)

λ₃ and λ₂ are averaging constants that may take values between 0 and 1. The higher the values of λ₃ and λ₂, the smoother the averaging process(es) and the lower the variance of the estimates. Values in the range 0.7 to 0.8 have typically been found to give good results. The desired audio maxima of the two uncalibrated digital audio signals 846 a, 846 b (e.g., Q̂₁(m) and Q̂₂(m), where m is the multiple-frame index) may be computed by searching for the maximum of the smoothed block energy estimates over several frames, say K consecutive frames, as shown in equations (27) and (28):

$\hat{Q}_1(m) = \max\{\bar{P}_{11}((m-1)k), \bar{P}_{11}((m-1)k-1), \ldots, \bar{P}_{11}((m-1)k-K+1)\}$   (27)

$\hat{Q}_2(m) = \max\{\bar{P}_{22}((m-1)k), \bar{P}_{22}((m-1)k-1), \ldots, \bar{P}_{22}((m-1)k-K+1)\}$   (28)

The maxima may optionally be smoothed to obtain smoother estimates, as shown in equations (29) and (30):

$\bar{Q}_1(m) = \lambda_4 \bar{Q}_1(m-1) + (1-\lambda_4)\hat{Q}_1(m)$   (29)

$\bar{Q}_2(m) = \lambda_5 \bar{Q}_2(m-1) + (1-\lambda_5)\hat{Q}_2(m)$   (30)

λ₄ and λ₅ are averaging constants that may take values between 0 and 1. The higher the values of λ₄ and λ₅, the smoother the averaging process(es) and the lower the variance of the estimates. Typically, these averaging constants are chosen in the range 0.5 to 0.7. The calibration factor for the second uncalibrated digital audio signal 846 b may be estimated by computing the square root of the ratio of the smoothed maxima of the two uncalibrated digital audio signals 846 a, 846 b, as shown in equation (31). Because the maxima are energies, the square root converts their ratio into an amplitude scaling factor.

$\begin{matrix}{{{\hat{c}}_{2}(m)} = \sqrt{\frac{{\overset{\_}{Q}}_{1}(m)}{{\overset{\_}{Q}}_{2}(m)}}} & (31)\end{matrix}$

The calibration factor ĉ₂(m) may optionally be smoothed as shown in equation (32):

$\bar{c}_2(m) = \beta_3 \bar{c}_2(m-1) + (1-\beta_3)\hat{c}_2(m)$   (32)

β₃ is an averaging constant that may take values between 0 and 1. The higher the value of β₃, the smoother the averaging process and the lower the variance of the estimates. This smoothing process may minimize abrupt variation in the calibration factor for the second uncalibrated digital audio signal 846 b. The calibration factor, as calculated by the calibration block 868 a, may be used to multiply the second uncalibrated digital audio signal 846 b. This scales the second uncalibrated digital audio signal 846 b such that the desired audio energy levels in the digital audio signals 812 a, 812 b are balanced before beamforming.
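A minimal pure-Python sketch of the maximum searching method in equations (23) through (32), assuming non-overlapping frames and non-overlapping groups of K frames. The default constants are picked from the ranges given above; β₃ = 0.8 and the initial factor of 1.0 are assumptions, since no value is stated for them.

```python
import math
from typing import List, Sequence


def max_search_calibration(
    z1: Sequence[float],
    z2: Sequence[float],
    frame_len: int,                 # N
    frames_per_search: int,         # K
    lam3: float = 0.75,             # lambda_3
    lam2: float = 0.75,             # lambda_2 as used in (26)
    lam4: float = 0.6,              # lambda_4
    lam5: float = 0.6,              # lambda_5
    beta3: float = 0.8,             # assumed; no range is given for beta_3
) -> List[float]:
    """Return the smoothed factor c2(m) for z2, one value per group of K frames."""
    p11_bar = p22_bar = 0.0
    q1_bar = q2_bar = 0.0
    c2_bar = 1.0                    # assumed initial factor: no scaling
    win1: List[float] = []
    win2: List[float] = []
    factors: List[float] = []
    for k in range(min(len(z1), len(z2)) // frame_len):
        f1 = z1[k * frame_len:(k + 1) * frame_len]
        f2 = z2[k * frame_len:(k + 1) * frame_len]
        p11_bar = lam3 * p11_bar + (1 - lam3) * sum(v * v for v in f1)  # (23), (25)
        p22_bar = lam2 * p22_bar + (1 - lam2) * sum(v * v for v in f2)  # (24), (26)
        win1.append(p11_bar)
        win2.append(p22_bar)
        if len(win1) == frames_per_search:                  # one K-frame search window
            q1_bar = lam4 * q1_bar + (1 - lam4) * max(win1)  # (27), (29)
            q2_bar = lam5 * q2_bar + (1 - lam5) * max(win2)  # (28), (30)
            if q2_bar > 0.0:
                c2_hat = math.sqrt(q1_bar / q2_bar)             # (31)
                c2_bar = beta3 * c2_bar + (1 - beta3) * c2_hat  # (32)
            factors.append(c2_bar)
            win1.clear()
            win2.clear()
    return factors
```

Because the search looks for energy peaks, which are usually dominated by the talker's speech, this method can track the desired audio level difference without an explicit voice activity detector, although the two approaches could also be combined.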

FIG. 8 b is a block diagram illustrating some aspects of another possible configuration of a calibrator 848 b. In this configuration, the inverse of the calibration factor (as calculated by the calibration block 868 b) may be applied to the first uncalibrated digital audio signal 846 a. This scales the first uncalibrated digital audio signal 846 a such that the desired audio energy levels in the digital audio signals 812 a, 812 b are balanced before beamforming.

FIG. 8 c is a block diagram illustrating some aspects of another possible configuration of a calibrator 848 c. In this configuration, two calibration factors that will balance the desired audio energy levels in the digital audio signals 812 a, 812 b may be calculated by the calibration block 868 c. These two calibration factors may be applied to the uncalibrated digital audio signals 846 a, 846 b.
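The three calibrator configurations differ only in which signal, or signals, the scaling is applied to. The following minimal sketch assumes a scalar factor c₂ has already been estimated (for example, by either method described earlier) such that c₂·z₂(n) matches the desired audio level of z₁(n). How the calibration block 868 c derives its two factors is not specified here; splitting c₂ symmetrically into 1/√c₂ and √c₂ is purely an illustrative assumption, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[List[float], List[float]]


def scale_second(z1: Sequence[float], z2: Sequence[float], c2: float) -> Pair:
    """FIG. 8 a style: leave z1 unchanged and multiply z2 by c2."""
    return list(z1), [c2 * v for v in z2]


def scale_first_inverse(z1: Sequence[float], z2: Sequence[float], c2: float) -> Pair:
    """FIG. 8 b style: leave z2 unchanged and multiply z1 by 1/c2."""
    return [v / c2 for v in z1], list(z2)


def scale_both(z1: Sequence[float], z2: Sequence[float], c2: float) -> Pair:
    """FIG. 8 c style, with an ASSUMED split of c2 into 1/sqrt(c2) and sqrt(c2)."""
    g = math.sqrt(c2)
    return [v / g for v in z1], [g * v for v in z2]
```

All three variants leave the desired audio at equal levels in the two outputs; they differ only in the absolute level presented to the first beamformer.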

Once the uncalibrated digital audio signals 846 a, 846 b are calibrated, the first digital audio signal 812 a and the second digital audio signal 812 b may be beamformed and/or refined as discussed above.

FIG. 9 a is a block diagram illustrating some aspects of one possible configuration of a noise reference calibrator 950 a. The noise reference signal generated by the first beamformer 714, and therefore the refined noise reference signal 922 obtained from it, may suffer from an attenuation problem: the strength of noise in the refined noise reference signal 922 may be much smaller than the strength of noise in the desired audio reference signal 916. The refined noise reference signal 922 may therefore be calibrated (e.g., scaled) by the calibration block 972 a before secondary beamforming is performed.

The calibration factor for the noise reference calibration may be computed using noise floor estimates. The calibration block 972 a may compute noise floor estimates for the desired audio reference signal 916 and the refined noise reference signal 922. The calibration block 972 a may then compute a calibration factor and apply it to the refined noise reference signal 922.

The block energy estimates of the desired audio reference signal (e.g., $z_{b1}(n)$) and the refined noise reference signal (e.g., $z_{br}(n)$) may be designated $P_{b1}(k)$ and $P_{br}(k)$, respectively, where k is the frame index.

The noise floor estimates of the block energies (e.g., $\hat{Q}_{b1}(m)$ and $\hat{Q}_{br}(m)$, where m is the multiple-frame index) may be computed by searching for a minimum value over a set of frames (e.g., K frames), as expressed in equations (33) and (34):

$\hat{Q}_{b1}(m) = \min\{P_{b1}((m-1)k), P_{b1}((m-1)k-1), \ldots, P_{b1}((m-1)k-K+1)\}$   (33)

$\hat{Q}_{br}(m) = \min\{P_{br}((m-1)k), P_{br}((m-1)k-1), \ldots, P_{br}((m-1)k-K+1)\}$   (34)

The noise floor estimates (e.g., $\hat{Q}_{b1}(m)$ and $\hat{Q}_{br}(m)$) may optionally be smoothed using an exponential averaging method, as shown in equations (35) and (36); the smoothed noise floor estimates may be designated $\bar{Q}_{b1}(m)$ and $\bar{Q}_{br}(m)$:

$\bar{Q}_{b1}(m) = \lambda_6 \bar{Q}_{b1}(m-1) + (1-\lambda_6)\hat{Q}_{b1}(m)$   (35)

$\bar{Q}_{br}(m) = \lambda_7 \bar{Q}_{br}(m-1) + (1-\lambda_7)\hat{Q}_{br}(m)$   (36)

λ₆ and λ₇ are averaging constants that may take values between 0 and 1. The higher the values of λ₆ and λ₇, the smoother the averaging process(es) and the lower the variance of the estimates. The averaging constants are typically chosen in the range 0.7 to 0.8. The calibration factor for the refined noise reference signal 922 may be designated $\hat{c}_{nr}(m)$ and may be computed as expressed in equation (37):

$\begin{matrix}{{{\hat{c}}_{nr}(m)} = \frac{{\overset{\_}{Q}}_{b\; 1}(m)}{{\overset{\_}{Q}}_{br}(m)}} & (37)\end{matrix}$

The estimated calibration factor $\hat{c}_{nr}(m)$ may optionally be smoothed (resulting in $\bar{c}_{nr}(m)$) to minimize discontinuities in the calibrated noise reference signal 952, as expressed in equation (38):

$\bar{c}_{nr}(m) = \beta_4 \bar{c}_{nr}(m-1) + (1-\beta_4)\hat{c}_{nr}(m)$   (38)

β₄ is an averaging constant that may take values between 0 and 1. The higher the value of β₄, the smoother the averaging process and the lower the variance of the estimates. Typically, the averaging constant is chosen in the range 0.7 to 0.8. The calibrated noise reference signal 952, obtained by applying the calibration factor to the refined noise reference signal 922, may be designated $z_{nf}(n)$.
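A minimal pure-Python sketch of the noise floor calibration in equations (33) through (38), assuming non-overlapping frames and non-overlapping groups of K frames. Default constants are picked from the ranges above. The initial factor of 1.0, and the choice to apply the most recent factor to each incoming frame, are assumptions. Note that (37) is written as a ratio of noise floor energies, whereas (31) takes a square root before scaling samples. The sketch follows (37) as written and offers a hypothetical `sqrt_gain` switch for the amplitude-domain variant.

```python
import math
from typing import List, Sequence, Tuple


def noise_reference_calibration(
    zb1: Sequence[float],            # desired audio reference z_b1(n)
    zbr: Sequence[float],            # refined noise reference z_br(n)
    frame_len: int,
    frames_per_search: int,          # K
    lam6: float = 0.75,
    lam7: float = 0.75,
    beta4: float = 0.75,
    sqrt_gain: bool = False,         # hypothetical option, see lead-in
) -> Tuple[List[float], List[float]]:
    """Return (factors, one per K-frame group; calibrated noise reference z_nf(n))."""
    q_b1 = q_br = 0.0
    c_nr = 1.0                       # assumed initial factor
    win_b1: List[float] = []
    win_br: List[float] = []
    factors: List[float] = []
    z_nf: List[float] = []
    for k in range(min(len(zb1), len(zbr)) // frame_len):
        s = slice(k * frame_len, (k + 1) * frame_len)
        f_b1, f_br = zb1[s], zbr[s]
        z_nf.extend(c_nr * v for v in f_br)        # apply the most recent factor
        win_b1.append(sum(v * v for v in f_b1))    # P_b1(k)
        win_br.append(sum(v * v for v in f_br))    # P_br(k)
        if len(win_b1) == frames_per_search:
            q_b1 = lam6 * q_b1 + (1 - lam6) * min(win_b1)  # (33), (35)
            q_br = lam7 * q_br + (1 - lam7) * min(win_br)  # (34), (36)
            if q_br > 0.0:
                c_hat = q_b1 / q_br                          # (37)
                if sqrt_gain:
                    c_hat = math.sqrt(c_hat)
                c_nr = beta4 * c_nr + (1 - beta4) * c_hat    # (38)
            factors.append(c_nr)
            win_b1.clear()
            win_br.clear()
    return factors, z_nf
```

Using minima rather than maxima makes the estimate follow the background noise floor, which is what the second beamformer needs to match, rather than the speech peaks.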

FIG. 9 b is a block diagram illustrating some aspects of another possible configuration of a noise reference calibrator 950 b. The refined noise reference signal 922 may be divided into two (or more) sub-bands, and a separate calibration factor may be computed by the calibration block 972 b and applied to each sub-band. The low- and high-frequency components of the refined noise reference signal 922 may benefit from having different calibration values.

If the refined noise reference signal 922 is divided into two sub-bands, as shown in FIG. 9 b, the sub-bands may be obtained with a low-pass filter (LPF) 976 a and a high-pass filter (HPF) 978 a, respectively. If the refined noise reference signal 922 is divided into more than two sub-bands, each sub-band may be obtained with a band-pass filter.

The calibration block 972 b may compute noise floor estimates for the desired audio reference signal 916 and the sub-bands of the refined noise reference signal 922. The calibration block 972 b may then compute calibration factors and apply them to the sub-bands of the refined noise reference signal 922. The block energy estimates of the desired audio reference signal (e.g., $z_{b1}(n)$) and of the low and high sub-bands of the refined noise reference signal (e.g., $z_{br}(n)$) may be designated $P_{b1}(k)$, $P_{nLPF}(k)$, and $P_{nHPF}(k)$, respectively, where k is the frame index. The noise floor estimates of the block energies (e.g., $\hat{Q}_{b1}(m)$, $\hat{Q}_{nLPF}(m)$, and $\hat{Q}_{nHPF}(m)$, where m is the multiple-frame index) may be computed by searching for a minimum value over a set of frames (e.g., K frames), as expressed in equations (39), (40), and (41):

$\hat{Q}_{b1}(m) = \min\{P_{b1}((m-1)k), P_{b1}((m-1)k-1), \ldots, P_{b1}((m-1)k-K+1)\}$   (39)

$\hat{Q}_{nLPF}(m) = \min\{P_{nLPF}((m-1)k), P_{nLPF}((m-1)k-1), \ldots, P_{nLPF}((m-1)k-K+1)\}$   (40)

$\hat{Q}_{nHPF}(m) = \min\{P_{nHPF}((m-1)k), P_{nHPF}((m-1)k-1), \ldots, P_{nHPF}((m-1)k-K+1)\}$   (41)

The noise floor estimates (e.g., $\hat{Q}_{b1}(m)$, $\hat{Q}_{nLPF}(m)$, and $\hat{Q}_{nHPF}(m)$) may optionally be smoothed using an exponential averaging method, as shown in equations (42), (43), and (44); the smoothed noise floor estimates may be designated $\bar{Q}_{b1}(m)$, $\bar{Q}_{nLPF}(m)$, and $\bar{Q}_{nHPF}(m)$:

$\bar{Q}_{b1}(m) = \lambda_6 \bar{Q}_{b1}(m-1) + (1-\lambda_6)\hat{Q}_{b1}(m)$   (42)

$\bar{Q}_{nLPF}(m) = \lambda_8 \bar{Q}_{nLPF}(m-1) + (1-\lambda_8)\hat{Q}_{nLPF}(m)$   (43)

$\bar{Q}_{nHPF}(m) = \lambda_9 \bar{Q}_{nHPF}(m-1) + (1-\lambda_9)\hat{Q}_{nHPF}(m)$   (44)

λ₈ and λ₉ are averaging constants that may take values between 0 and 1. The higher the values of λ₈ and λ₉, the smoother the averaging process(es) and the lower the variance of the estimates. Typically, averaging constants in the range 0.5 to 0.8 may be used. The calibration factors for the refined noise reference signal 922 may be designated $\hat{c}_{1LPF}(m)$ and $\hat{c}_{1HPF}(m)$ and may be computed as expressed in equations (45) and (46):

$\begin{matrix}{{{\hat{c}}_{1{LPF}}(m)} = \frac{{\overset{\_}{Q}}_{b\; 1}(m)}{{\overset{\_}{Q}}_{nLPF}(m)}} & (45) \\{{{\hat{c}}_{1{HPF}}(m)} = \frac{{\overset{\_}{Q}}_{b\; 1}(m)}{{\overset{\_}{Q}}_{nHPF}(m)}} & (46)\end{matrix}$

The estimated calibration factors may optionally be smoothed (resulting in $\bar{c}_{1LPF}(m)$ and $\bar{c}_{1HPF}(m)$) to minimize discontinuities in the calibrated noise reference signal 952 b, as expressed in equations (47) and (48):

$\bar{c}_{1LPF}(m) = \beta_5 \bar{c}_{1LPF}(m-1) + (1-\beta_5)\hat{c}_{1LPF}(m)$   (47)

$\bar{c}_{1HPF}(m) = \beta_6 \bar{c}_{1HPF}(m-1) + (1-\beta_6)\hat{c}_{1HPF}(m)$   (48)

β₅ and β₆ are averaging constants that may take values between 0 and 1. The higher the values of β₅ and β₆, the smoother the averaging process and the lower the variance of the estimates. Typically, averaging constants in the range 0.7 to 0.8 may be used. The calibrated noise reference signal 952 b may be the sum of the two scaled sub-bands of the refined noise reference signal 922 and may be designated $z_{nf}(n)$.

FIG. 9 c is a block diagram illustrating some aspects of another possible configuration of a noise reference calibrator 950 c. Both the refined noise reference signal 922 and the desired audio reference signal 916 may be divided into two sub-bands, and a separate calibration factor may be computed by the calibration block 972 c and applied to each sub-band. The low- and high-frequency components of the refined noise reference signal 922 may benefit from different calibration values.

The desired audio reference signal 916 may be divided into sub-bands by a low-pass filter 976 b and a high-pass filter 978 b. The refined noise reference signal 922 may be divided into sub-bands by a low-pass filter 976 a and a high-pass filter 978 a. The calibration block 972 c may compute noise floor estimates for the sub-bands of the desired audio reference signal 916 and the sub-bands of the refined noise reference signal 922. The calibration block 972 c may then compute calibration factors and apply them to the sub-bands of the refined noise reference signal 922. The block energy estimates of the sub-bands of the desired audio reference signal (e.g., $z_{b1}(n)$) and the sub-bands of the refined noise reference signal (e.g., $z_{br}(n)$) may be designated $P_{LPF}(k)$, $P_{HPF}(k)$, $P_{nLPF}(k)$, and $P_{nHPF}(k)$, respectively, where k is the frame index. The noise floor estimates of the block energies (e.g., $\hat{Q}_{LPF}(m)$, $\hat{Q}_{HPF}(m)$, $\hat{Q}_{nLPF}(m)$, and $\hat{Q}_{nHPF}(m)$, where m is the multiple-frame index) may be computed by searching for a minimum value over a set of frames (e.g., K frames), as expressed in equations (49), (50), (51), and (52):

$\hat{Q}_{LPF}(m) = \min\{P_{LPF}((m-1)k), P_{LPF}((m-1)k-1), \ldots, P_{LPF}((m-1)k-K+1)\}$   (49)

$\hat{Q}_{HPF}(m) = \min\{P_{HPF}((m-1)k), P_{HPF}((m-1)k-1), \ldots, P_{HPF}((m-1)k-K+1)\}$   (50)

$\hat{Q}_{nLPF}(m) = \min\{P_{nLPF}((m-1)k), P_{nLPF}((m-1)k-1), \ldots, P_{nLPF}((m-1)k-K+1)\}$   (51)

$\hat{Q}_{nHPF}(m) = \min\{P_{nHPF}((m-1)k), P_{nHPF}((m-1)k-1), \ldots, P_{nHPF}((m-1)k-K+1)\}$   (52)

The noise floor estimates (e.g., $\hat{Q}_{LPF}(m)$, $\hat{Q}_{HPF}(m)$, $\hat{Q}_{nLPF}(m)$, and $\hat{Q}_{nHPF}(m)$) may optionally be smoothed using an exponential averaging method, as shown in equations (53), (54), (55), and (56); the smoothed noise floor estimates may be designated $\bar{Q}_{LPF}(m)$, $\bar{Q}_{HPF}(m)$, $\bar{Q}_{nLPF}(m)$, and $\bar{Q}_{nHPF}(m)$:

$\bar{Q}_{LPF}(m) = \lambda_{10} \bar{Q}_{LPF}(m-1) + (1-\lambda_{10})\hat{Q}_{LPF}(m)$   (53)

$\bar{Q}_{HPF}(m) = \lambda_{11} \bar{Q}_{HPF}(m-1) + (1-\lambda_{11})\hat{Q}_{HPF}(m)$   (54)

$\bar{Q}_{nLPF}(m) = \lambda_8 \bar{Q}_{nLPF}(m-1) + (1-\lambda_8)\hat{Q}_{nLPF}(m)$   (55)

$\bar{Q}_{nHPF}(m) = \lambda_9 \bar{Q}_{nHPF}(m-1) + (1-\lambda_9)\hat{Q}_{nHPF}(m)$   (56)

λ₁₀ and λ₁₁ are averaging constants that may take values between 0 and 1. The higher the values of λ₁₀ and λ₁₁, the smoother the averaging process(es) and the lower the variance of the estimates. The averaging constants may be chosen in the range 0.5 to 0.8. The calibration factors for the refined noise reference signal 922 may be designated $\hat{c}_{2LPF}(m)$ and $\hat{c}_{2HPF}(m)$ and may be computed as expressed in equations (57) and (58):

$\begin{matrix}{{{\hat{c}}_{2{LPF}}(m)} = \frac{{\overset{\_}{Q}}_{LPF}(m)}{{\overset{\_}{Q}}_{nLPF}(m)}} & (57) \\{{{\hat{c}}_{2{HPF}}(m)} = \frac{{\overset{\_}{Q}}_{HPF}(m)}{{\overset{\_}{Q}}_{nHPF}(m)}} & (58)\end{matrix}$

The estimated calibration factors may optionally be smoothed (resulting in $\bar{c}_{2LPF}(m)$ and $\bar{c}_{2HPF}(m)$) to minimize discontinuities in the calibrated noise reference signal 952, as expressed in equations (59) and (60):

$\bar{c}_{2LPF}(m) = \beta_7 \bar{c}_{2LPF}(m-1) + (1-\beta_7)\hat{c}_{2LPF}(m)$   (59)

$\bar{c}_{2HPF}(m) = \beta_8 \bar{c}_{2HPF}(m-1) + (1-\beta_8)\hat{c}_{2HPF}(m)$   (60)

β₇ and β₈ are averaging constants that may take values between 0 and 1. The higher the values of β₇ and β₈, the smoother the averaging process and the lower the variance of the estimates. Typically, values in the range 0.7 to 0.8 may be used. The calibrated noise reference signal 952 may be the sum of the two scaled sub-bands of the refined noise reference signal 922 and may be designated $z_{nf}(n)$.
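One sketch can cover both FIG. 9 b and FIG. 9 c, since they differ only in whether the numerator of each calibration factor uses the full-band noise floor of the desired audio reference signal, as in (45) and (46), or the matching sub-band, as in (57) and (58). The two-band split here is an assumed complementary pair (a one-pole IIR low-pass and its residual as the high band), chosen so that the bands sum back to the input; it is not the specific design of the filters 976 a, 976 b, 978 a, 978 b. A single λ and a single β stand in for the separate constants λ₆ through λ₁₁ and β₅ through β₈, and each group's factors are applied to the frames that follow it. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_bands(x: Sequence[float], alpha: float = 0.7) -> Tuple[List[float], List[float]]:
    """Assumed complementary split: one-pole IIR low band; high band = x - low band."""
    low: List[float] = []
    y = 0.0
    for v in x:
        y = alpha * y + (1.0 - alpha) * v
        low.append(y)
    return low, [v - lo for v, lo in zip(x, low)]


def block_energies(x: Sequence[float], n: int) -> List[float]:
    """Energy of each non-overlapping frame of length n."""
    return [sum(v * v for v in x[i:i + n]) for i in range(0, len(x) - n + 1, n)]


def subband_noise_calibration(
    zb1: Sequence[float],         # desired audio reference z_b1(n)
    zbr: Sequence[float],         # refined noise reference z_br(n)
    frame_len: int,
    K: int,
    split_desired: bool = False,  # False: FIG. 9 b, eqs (39)-(48); True: FIG. 9 c, eqs (49)-(60)
    lam: float = 0.7,             # stands in for each of lambda_6 .. lambda_11
    beta: float = 0.75,           # stands in for each of beta_5 .. beta_8
) -> Tuple[Tuple[float, float], List[float]]:
    """Return the final (low, high) calibration factors and z_nf(n)."""
    n_lo, n_hi = split_bands(zbr)
    den = (block_energies(n_lo, frame_len), block_energies(n_hi, frame_len))
    if split_desired:
        d_lo, d_hi = split_bands(zb1)
        num = (block_energies(d_lo, frame_len), block_energies(d_hi, frame_len))
    else:
        full = block_energies(zb1, frame_len)
        num = (full, full)        # full-band floor of z_b1 for both bands
    q_num = [0.0, 0.0]
    q_den = [0.0, 0.0]
    c = [1.0, 1.0]                # assumed initial factors
    z_nf: List[float] = []
    for k in range(min(len(num[0]), len(den[0]))):
        s = slice(k * frame_len, (k + 1) * frame_len)
        # Scale each sub-band by its current factor and sum the bands.
        z_nf.extend(c[0] * a + c[1] * b for a, b in zip(n_lo[s], n_hi[s]))
        if (k + 1) % K == 0:      # end of a K-frame search window
            win = slice(k + 1 - K, k + 1)
            for band in (0, 1):
                # Minimum search, then exponential smoothing of each noise floor.
                q_num[band] = lam * q_num[band] + (1.0 - lam) * min(num[band][win])
                q_den[band] = lam * q_den[band] + (1.0 - lam) * min(den[band][win])
                if q_den[band] > 0.0:
                    c_hat = q_num[band] / q_den[band]
                    c[band] = beta * c[band] + (1.0 - beta) * c_hat
    return (c[0], c[1]), z_nf
```

As in the equations, the factors here are ratios of energies. A real design would likely also pick the split frequency to match where the beamformer's attenuation changes.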

FIG. 10 is a block diagram illustrating some aspects of one possible configuration of a beamformer 1054. This beamformer 1054 may be used as the second beamformer 754 discussed earlier.

The primary purpose of secondary beamforming may be to use the calibrated refined noise reference signal 1052 to remove more noise from the desired audio reference signal 1016. The input to the adaptive filter 1084 may be chosen to be the calibrated refined noise reference signal 1052. This input signal may optionally be low-pass filtered by the LPF 1080 to prevent the beamformer 1054 from aggressively suppressing high-frequency content in the desired audio reference signal 1016. Low-pass filtering the input may help ensure that the second desired audio signal 1056 of the beamformer 1054 does not sound muffled. An Infinite Impulse Response (IIR) or Finite Impulse Response (FIR) filter with a cut-off frequency of 2800 to 3500 Hz for an 8 kHz sampling rate $f_s$ may be used to low-pass filter the calibrated refined noise reference signal 1052. The cut-off frequency may be doubled if the sampling rate $f_s$ is doubled.
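As one possible realization of the LPF 1080, here is a minimal sketch of a Hamming-windowed sinc FIR low-pass design in pure Python. The 31-tap length, the window choice, and the 3000 Hz cut-off (within the 2800 to 3500 Hz range above) are assumptions for illustration; an IIR design would serve equally well. Because the design depends only on the ratio of cut-off to sampling rate, doubling both yields the same taps, which is consistent with the scaling rule above.

```python
import math
from typing import List


def fir_lowpass(num_taps: int, cutoff_hz: float, fs_hz: float) -> List[float]:
    """Hamming-windowed sinc low-pass FIR, normalized to unit gain at DC."""
    if num_taps < 3 or num_taps % 2 == 0:
        raise ValueError("num_taps must be odd and at least 3")
    fc = cutoff_hz / fs_hz                       # normalized cut-off, cycles/sample
    mid = (num_taps - 1) // 2
    taps: List[float] = []
    for n in range(num_taps):
        m = n - mid
        # Ideal low-pass impulse response, centered on the middle tap.
        ideal = 2.0 * fc if m == 0 else math.sin(2.0 * math.pi * fc * m) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    dc_gain = sum(taps)
    return [t / dc_gain for t in taps]


def magnitude_response(taps: List[float], f_hz: float, fs_hz: float) -> float:
    """|H(e^{jw})| of the FIR filter at frequency f_hz."""
    w = 2.0 * math.pi * f_hz / fs_hz
    re = sum(t * math.cos(w * n) for n, t in enumerate(taps))
    im = -sum(t * math.sin(w * n) for n, t in enumerate(taps))
    return math.hypot(re, im)
```

With these settings the filter passes low frequencies at close to unit gain and strongly attenuates content near the 4 kHz Nyquist frequency, so the adaptive filter is fed mainly the low-frequency part of the noise reference.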

The calibrated refined noise reference signal 1052 may be designated $z_{nf}(n)$. The impulse response of the LPF 1080 may be designated $h_{LPF}(n)$. The low-pass filtered, calibrated, refined noise reference signal 1082 may be designated $z_j(n)$. The output 1086 of the adaptive filter 1084 may be designated $z_{w2}(n)$. The M adaptive filter weights may be designated $w_2(i)$ and may be updated using any adaptive filtering technique known in the art (e.g., LMS, NLMS, etc.). The desired audio reference signal 1016 may be designated $z_{b1}(n)$. The second desired audio signal 1056 may be designated $z_{sf}(n)$. The beamformer 1054 may be configured to implement a beamforming process as expressed in equations (61), (62), and (63), where * denotes convolution:

$\begin{matrix}{{z_{j}(n)} = {{z_{nf}(n)}*{h_{LPF}(n)}}} & (61) \\{{z_{w\; 2}(n)} = {\sum\limits_{i = 0}^{M - 1}{{w_{2}(i)}{z_{j}\left( {n - i} \right)}}}} & (62) \\{{z_{sf}(n)} = {{z_{b\; 1}(n)} - {z_{w\; 2}(n)}}} & (63)\end{matrix}$

Although not shown in FIG. 10, the calibrated, refined noise reference signal 1052, the low-pass filtered, calibrated, refined noise reference signal 1082, and/or the output 1086 of the adaptive filter 1084 may also be passed to a post-processing block (e.g., the post-processing block 760).

FIG. 11 is a block diagram illustrating some aspects of one possible configuration of a post-processing block 1160. Post-processing techniques may be used to remove additional residual noise from the second desired audio signal 1156. Post-processing methods such as spectral subtraction, Wiener filtering, etc. may be used to suppress further noise in the second desired audio signal 1156. The desired audio output signal 1162 may be transmitted, output through a speaker, or otherwise utilized. Any stage of the noise reference processed signal 1158 may also be utilized or provided as output 1164.
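The following minimal sketch applies textbook magnitude spectral subtraction to a single frame; spectral subtraction is one of the post-processing methods named above. It assumes a per-bin noise magnitude estimate is available, for example one derived from the noise reference processed signal 1158. How that estimate is formed, the spectral floor of 0.05, and the omission of windowing and overlap-add framing are all assumptions. A naive O(n²) DFT keeps the example dependency-free; a practical implementation would use an FFT.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive forward DFT: X[k] = sum_t x[t] * exp(-j 2 pi k t / n)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X: Sequence[complex]) -> List[complex]:
    """Naive inverse DFT."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def spectral_subtract(frame: Sequence[float], noise_mag: Sequence[float],
                      floor: float = 0.05) -> List[float]:
    """Subtract a noise magnitude estimate bin by bin, keeping the noisy phase.

    noise_mag needs one entry per DFT bin and should be symmetric
    (noise_mag[k] == noise_mag[n - k]) so that the output stays real.
    """
    cleaned: List[complex] = []
    for X_k, N_k in zip(dft(frame), noise_mag):
        mag = abs(X_k)
        # The spectral floor keeps a small fraction of each bin instead of zeroing it.
        new_mag = max(mag - N_k, floor * mag)
        cleaned.append(cmath.rect(new_mag, cmath.phase(X_k)) if mag > 0.0 else 0j)
    return [v.real for v in idft(cleaned)]
```

With a zero noise estimate the frame passes through unchanged. With an estimate larger than every bin, each bin is clamped to the floor, so the frame is simply scaled by 0.05.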

FIG. 12 is a flow diagram illustrating some aspects of one possible configuration of a method 1200 for suppressing ambient noise. The method 1200 may be implemented by a communication device, such as a mobile phone, “land line” phone, wired headset, wireless headset, hearing aid, audio/video recording device, etc.

Desired audio signals (which may include speech 106) as well as ambient noise (e.g., the ambient noise 108) may be received 1288 via multiple transducers (e.g., microphones 110 a, 110 b). These transducers may be closely spaced on the communication device. These analog audio signals may be converted 1289 to digital audio signals (e.g., digital audio signals 746 a, 746 b).

The digital audio signals may be calibrated 1290 such that the desired audio energy is balanced between the signals. Beamforming may then be performed 1291 on the signals, which may produce at least one desired audio reference signal (e.g., desired audio reference signal 716) and at least one noise reference signal (e.g., noise reference signal 718). The noise reference signal(s) may be refined 1292 by removing more desired audio from the noise reference signal(s). The noise reference signal(s) may then be calibrated 1293 such that the energy of the noise in the noise reference signal(s) is balanced with the noise in the desired audio reference signal(s). Additional beamforming may be performed 1294 to remove additional noise from the desired audio reference signal. Post processing may also be performed 1295.

The method 1200 described in FIG. 12 above may be performed by various hardware and/or software component(s) and/or module(s) corresponding to the means-plus-function blocks 1200 a illustrated in FIG. 12 a. In other words, blocks 1288 through 1295 illustrated in FIG. 12 correspond to means-plus-function blocks 1288 a through 1295 a illustrated in FIG. 12 a.

Reference is now made to FIG. 13. FIG. 13 illustrates certain components that may be included within a communication device 1302. The communication device 1302 may be configured to implement the methods for suppressing ambient noise described herein.

The communication device 1302 includes a processor 1370. The processor 1370 may be a general purpose single- or multi-chip microprocessor (e.g., an ARM), a special purpose microprocessor (e.g., a digital signal processor (DSP)), a microcontroller, a programmable gate array, etc. The processor 1370 may be referred to as a central processing unit (CPU). Although just a single processor 1370 is shown in the communication device 1302 of FIG. 13, in an alternative configuration, a combination of processors (e.g., an ARM and DSP) could be used.

The communication device 1302 also includes memory 1372. The memory 1372 may be any electronic component capable of storing electronic information. The memory 1372 may be embodied as random access memory (RAM), read only memory (ROM), magnetic disk storage media, optical storage media, flash memory devices in RAM, on-board memory included with the processor, EPROM memory, EEPROM memory, registers, and so forth, including combinations thereof.

Data 1374 and instructions 1376 may be stored in the memory 1372. The instructions 1376 may be executable by the processor 1370 to implement the methods disclosed herein. Executing the instructions 1376 may involve the use of the data 1374 that is stored in the memory 1372.

The communication device 1302 may also include multiple microphones 1310 a, 1310 b, 1310 n. The microphones 1310 a, 1310 b, 1310 n may receive audio signals that include speech and ambient noise, as discussed above. The communication device 1302 may also include a speaker 1390 for outputting audio signals.

The communication device 1302 may also include a transmitter 1378 and a receiver 1380 to allow wireless transmission and reception of signals between the communication device 1302 and a remote location. The transmitter 1378 and receiver 1380 may be collectively referred to as a transceiver 1382. An antenna 1384 may be electrically coupled to the transceiver 1382. The communication device 1302 may also include (not shown) multiple transmitters, multiple receivers, multiple transceivers and/or multiple antennas.

The various components of the communication device 1302 may be coupled together by one or more buses, which may include a power bus, a control signal bus, a status signal bus, a data bus, etc. For the sake of clarity, the various buses are illustrated in FIG. 13 as a bus system 1386.

In the above description, reference numbers have sometimes been used in connection with various terms. Where a term is used in connection with a reference number, this is meant to refer to a specific element that is shown in one or more of the Figures. Where a term is used without a reference number, this is meant to refer generally to the term without limitation to any particular Figure.

The term “determining” encompasses a wide variety of actions and, therefore, “determining” can include calculating, computing, processing, deriving, investigating, looking up (e.g., looking up in a table, a database or another data structure), ascertaining and the like. Also, “determining” can include receiving (e.g., receiving information), accessing (e.g., accessing data in a memory) and the like. Also, “determining” can include resolving, selecting, choosing, establishing and the like.

The phrase “based on” does not mean “based only on,” unless expressly specified otherwise. In other words, the phrase “based on” describes both “based only on” and “based at least on.”

The term “processor” should be interpreted broadly to encompass a general purpose processor, a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a controller, a microcontroller, a state machine, and so forth. Under some circumstances, a “processor” may refer to an application specific integrated circuit (ASIC), a programmable logic device (PLD), a field programmable gate array (FPGA), etc. The term “processor” may refer to a combination of processing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

The term “memory” should be interpreted broadly to encompass any electronic component capable of storing electronic information. The term memory may refer to various types of processor-readable media such as random access memory (RAM), read-only memory (ROM), non-volatile random access memory (NVRAM), programmable read-only memory (PROM), erasable programmable read only memory (EPROM), electrically erasable PROM (EEPROM), flash memory, magnetic or optical data storage, registers, etc. Memory is said to be in electronic communication with a processor if the processor can read information from and/or write information to the memory. Memory that is integral to a processor is in electronic communication with the processor.

The terms “instructions” and “code” should be interpreted broadly to include any type of computer-readable statement(s). For example, the terms “instructions” and “code” may refer to one or more programs, routines, sub-routines, functions, procedures, etc. “Instructions” and “code” may comprise a single computer-readable statement or many computer-readable statements. The terms “instructions” and “code” may be used interchangeably herein.

The functions described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored as one or more instructions on a computer-readable medium. The term “computer-readable medium” refers to any available medium that can be accessed by a computer. By way of example, and not limitation, a computer-readable medium may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray® disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.

Software or instructions may also be transmitted over a transmission medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of transmission medium.

The methods disclosed herein comprise one or more steps or actions for achieving the described method. The method steps and/or actions may be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of steps or actions is required for proper operation of the method that is being described, the order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims.

Further, it should be appreciated that modules and/or other appropriate means for performing the methods and techniques described herein, such as those illustrated by FIGS. 6 and 12, can be downloaded and/or otherwise obtained by a device. For example, a device may be coupled to a server to facilitate the transfer of means for performing the methods described herein. Alternatively, various methods described herein can be provided via a storage means (e.g., random access memory (RAM), read only memory (ROM), a physical storage medium such as a compact disc (CD) or floppy disk, etc.), such that a device may obtain the various methods upon coupling or providing the storage means to the device. Moreover, any other suitable technique for providing the methods and techniques described herein to a device can be utilized.

It is to be understood that the claims are not limited to the precise configuration and components illustrated above. Various modifications, changes and variations may be made in the arrangement, operation and details of the systems, methods, and apparatus described herein without departing from the scope of the claims.

1. A method for suppressing ambient noise using multiple audio signals, comprising: providing at least two audio signals by at least two electro-acoustic transducers, wherein the at least two audio signals comprise desired audio and ambient noise; performing beamforming on the at least two audio signals in order to obtain a desired audio reference signal that is separate from a noise reference signal; and refining the noise reference signal by removing residual desired audio from the noise reference signal, thereby obtaining a refined noise reference signal.
2. The method of claim 1, wherein the residual desired audio is high-frequency residual desired audio.
3. The method of claim 1, wherein the method is implemented by a communication device, and wherein the desired audio comprises speech.
4. The method of claim 1, wherein the at least two electro-acoustic transducers are microphones.
5. The method of claim 1, further comprising calibrating the at least two signals in order to balance desired audio energy between the at least two signals.
6. The method of claim 1, further comprising calibrating the refined noise reference signal to compensate for attenuation effects caused by the beamforming.
7. The method of claim 6, wherein calibrating the refined noise reference signal comprises: filtering the refined noise reference signal in order to obtain at least two sub-bands; calculating calibration factors, a separate calibration factor being calculated for each sub-band; calibrating the sub-bands by multiplying the sub-bands by the calibration factors; and summing the calibrated sub-bands.
8. The method of claim 1, wherein the beamforming comprises fixed beamforming.
9. The method of claim 1, wherein the beamforming comprises adaptive beamforming.
10. The method of claim 1, further comprising performing additional beamforming to remove additional noise from the desired audio reference signal.
11. The method of claim 10, wherein performing additional beamforming comprises: low-pass filtering a calibrated, refined noise reference signal; and performing adaptive filtering on the low-pass filtered, calibrated, refined noise reference signal.
12. An apparatus for suppressing ambient noise using multiple audio signals, comprising: at least two electro-acoustic transducers that provide at least two audio signals comprising desired audio and ambient noise; a beamformer that performs beamforming on the at least two audio signals in order to obtain a desired audio reference signal that is separate from a noise reference signal; and a noise reference refiner that refines the noise reference signal by removing residual desired audio from the noise reference signal, thereby obtaining a refined noise reference signal.
13. The apparatus of claim 12, wherein the residual desired audio is high-frequency residual desired audio.
14. The apparatus of claim 12, wherein the apparatus is a communication device, and wherein the desired audio comprises speech.
15. The apparatus of claim 12, wherein the at least two electro-acoustic transducers are microphones.
16. The apparatus of claim 12, further comprising a calibrator that calibrates the at least two signals in order to balance desired audio energy between the at least two signals.
17. The apparatus of claim 12, further comprising a noise reference calibrator that calibrates the refined noise reference signal to compensate for attenuation effects caused by the beamforming.
18. The apparatus of claim 17, wherein the noise reference calibrator comprises: at least two filters that filter the refined noise reference signal in order to obtain at least two sub-bands; a calibration unit that calculates calibration factors, a separate calibration factor being calculated for each sub-band; at least two multipliers that calibrate the sub-bands by multiplying the sub-bands by the calibration factors; and an adder that sums the calibrated sub-bands.
19. The apparatus of claim 12, wherein the beamformer is a fixed beamformer.
20. The apparatus of claim 12, wherein the beamformer is an adaptive beamformer.
21. The apparatus of claim 12, further comprising a second beamformer that performs additional beamforming to remove additional noise from the desired audio reference signal.
22. The apparatus of claim 21, wherein the second beamformer comprises: a low-pass filter that performs low-pass filtering on a calibrated, refined noise reference signal; and an adaptive filter that performs adaptive filtering on the low-pass filtered, calibrated, refined noise reference signal.
23. An apparatus for suppressing ambient noise using multiple audio signals, comprising: means for providing at least two audio signals by at least two electro-acoustic transducers, wherein the at least two audio signals comprise desired audio and ambient noise; means for performing beamforming on the at least two audio signals in order to obtain a desired audio reference signal that is separate from a noise reference signal; and means for refining the noise reference signal by removing residual desired audio from the noise reference signal, thereby obtaining a refined noise reference signal.
24. The apparatus of claim 23, wherein the residual desired audio is high-frequency residual desired audio.
25. The apparatus of claim 23, further comprising means for calibrating the at least two signals in order to balance desired audio energy between the at least two signals.
26. The apparatus of claim 23, further comprising means for calibrating the refined noise reference signal to compensate for attenuation effects caused by the beamforming.
27. The apparatus of claim 26, wherein the means for calibrating the refined noise reference signal comprises: means for filtering the refined noise reference signal in order to obtain at least two sub-bands; means for calculating calibration factors, a separate calibration factor being calculated for each sub-band; means for calibrating the sub-bands by multiplying the sub-bands by the calibration factors; and means for summing the calibrated sub-bands.
28. The apparatus of claim 23, further comprising means for performing additional beamforming to remove additional noise from the desired audio reference signal, the means for performing additional beamforming comprising: means for low-pass filtering a calibrated, refined noise reference signal, thereby obtaining a low-pass filtered, calibrated, refined noise reference signal; and means for performing adaptive filtering on the low-pass filtered, calibrated, refined noise reference signal.
29. A computer-program product for suppressing ambient noise using multiple audio signals, the computer-program product comprising a computer-readable medium having instructions thereon, the instructions comprising: code for providing at least two audio signals by at least two electro-acoustic transducers, wherein the at least two audio signals comprise desired audio and ambient noise; code for performing beamforming on the at least two audio signals in order to obtain a desired audio reference signal that is separate from a noise reference signal; and code for refining the noise reference signal by removing residual desired audio from the noise reference signal, thereby obtaining a refined noise reference signal.
30. The computer-program product of claim 29, wherein the residual desired audio is high-frequency residual desired audio.
31. The computer-program product of claim 29, further comprising code for calibrating the at least two signals in order to balance desired audio energy between the at least two signals.
32. The computer-program product of claim 29, further comprising code for calibrating the refined noise reference signal to compensate for attenuation effects caused by the beamforming.
33. The computer-program product of claim 32, wherein the code for calibrating the refined noise reference signal comprises: code for filtering the refined noise reference signal in order to obtain at least two sub-bands; code for calculating calibration factors, a separate calibration factor being calculated for each sub-band; code for calibrating the sub-bands by multiplying the sub-bands by the calibration factors; and code for summing the calibrated sub-bands.
34. The computer-program product of claim 29, further comprising code for performing additional beamforming to remove additional noise from the desired audio reference signal, the code for performing additional beamforming comprising: code for low-pass filtering a calibrated, refined noise reference signal, thereby obtaining a low-pass filtered, calibrated, refined noise reference signal; and code for performing adaptive filtering on the low-pass filtered, calibrated, refined noise reference signal.
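The claims recite the processing chain only in functional terms. The following Python sketch strings the claimed steps together to make the data flow concrete, under several assumptions the claims do not make: a sum/difference fixed beamformer (one possible instance of claims 8 and 19) for two time-aligned, calibrated microphones; an NLMS adaptive filter as the noise reference refiner; a two-band split built from a one-pole low-pass filter and its complement (claim 7); and calibration factors computed as the square root of a per-band energy ratio against a reference signal. The function names (`fixed_beamformer`, `nlms_cancel`, `split_two_bands`, `calibration_factors`, `calibrate_noise_reference`, `second_stage`) and the parameter values (`taps`, `mu`, `alpha`) are hypothetical and chosen only for illustration.

```python
import math

# Illustrative sketch only: the beamformer type, adaptive algorithm (NLMS),
# band split and calibration rule are assumptions, not taken from the claims.


def fixed_beamformer(x1, x2):
    """Sum/difference beamformer for two calibrated, time-aligned microphones.

    Assumes the desired source reaches both microphones with equal amplitude
    and phase, so the sum keeps it and the difference ideally cancels it.
    """
    desired_ref = [(a + b) / 2.0 for a, b in zip(x1, x2)]
    noise_ref = [(a - b) / 2.0 for a, b in zip(x1, x2)]
    return desired_ref, noise_ref


def nlms_cancel(reference, primary, taps=8, mu=0.5, eps=1e-8):
    """Subtracts from `primary` an NLMS estimate of its part correlated with `reference`."""
    w = [0.0] * taps
    buf = [0.0] * taps  # most recent reference samples, newest first
    out = []
    for r, p in zip(reference, primary):
        buf = [r] + buf[:-1]
        err = p - sum(wi * bi for wi, bi in zip(w, buf))
        norm = eps + sum(b * b for b in buf)
        w = [wi + mu * err * bi / norm for wi, bi in zip(w, buf)]
        out.append(err)
    return out


def split_two_bands(signal, alpha=0.2):
    """One-pole low-pass band plus its complement, so low + high equals the input."""
    low, high, state = [], [], 0.0
    for s in signal:
        state += alpha * (s - state)
        low.append(state)
        high.append(s - state)
    return low, high


def calibration_factors(reference, noise_ref, alpha=0.2, eps=1e-12):
    """One gain per sub-band: sqrt(band energy of reference / band energy of noise_ref)."""
    factors = []
    for ref_band, nr_band in zip(split_two_bands(reference, alpha),
                                 split_two_bands(noise_ref, alpha)):
        e_ref = sum(v * v for v in ref_band)
        e_nr = sum(v * v for v in nr_band)
        factors.append(math.sqrt(e_ref / (e_nr + eps)))
    return factors


def calibrate_noise_reference(noise_ref, factors, alpha=0.2):
    """Filters into sub-bands, multiplies each by its factor, and sums them."""
    low, high = split_two_bands(noise_ref, alpha)
    return [factors[0] * lo + factors[1] * hi for lo, hi in zip(low, high)]


def second_stage(desired_ref, calibrated_noise, alpha=0.2):
    """Low-pass filters the calibrated noise reference, then adaptively cancels it."""
    low, _ = split_two_bands(calibrated_noise, alpha)
    return nlms_cancel(low, desired_ref)
```

In this sketch, `refined = nlms_cancel(desired_ref, noise_ref)` feeds the desired audio reference to the adaptive filter, so whatever part of the noise reference is linearly predictable from it (the residual desired audio) is subtracted out. To aim specifically at the high-frequency residual of claim 2, one could instead pass the high band returned by `split_two_bands(desired_ref)` as the reference; that variant is likewise an assumption. Because the high band is defined as the input minus the low band, `calibrate_noise_reference` with factors of 1.0 returns the input unchanged, which makes the per-band gains easy to reason about.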